Smart Paper Notes

PhishMatch: A Layered Approach for Effective Detection of Phishing URLs

Harshal Tupsamudre, Sparsh Jain, Sachin Lodha

Category: Machine Learning

2021-12-04

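Phishing attacks continue to be a significant threat on the Internet. Prior studies have shown that it is possible to determine whether a website is phishing just by analyzing its URL more carefully. A major advantage of URL-based approaches is that they can identify a phishing website even before the web page is rendered in the browser, thus avoiding other potential problems such as cryptojacking and drive-by downloads. However, traditional URL-based approaches have their limitations. Blacklist-based approaches are prone to zero-hour phishing attacks, approaches based on advanced machine learning consume substantial resources, and other approaches send the URL to remote servers, which compromises the user's privacy. In this paper, we present a layered anti-phishing defense, PhishMatch, which is robust, accurate, inexpensive, and client-side. We design a space-efficient Aho-Corasick algorithm for exact string matching and an n-gram-based indexing technique for approximate string matching to detect various cybersquatting techniques in phishing URLs. To reduce false positives, we use a global whitelist and personalized user whitelists. We also determine the context in which the URL is visited and use that information to classify the input URL more accurately. The last component of PhishMatch involves a machine learning model and controlled search engine queries to classify the URL. A prototype plugin of PhishMatch, developed for the Chrome browser, was found to be fast and lightweight. Our evaluation shows that PhishMatch is both efficient and effective.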

translated by Google Translate
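The abstract names two string-matching layers: exact matching with Aho-Corasick and approximate matching with an n-gram index, both aimed at spotting brand names (or near-misses such as "paypa1") inside a URL. The Python sketch here illustrates those two ideas under our own assumptions; it is not PhishMatch's implementation. It uses a textbook dictionary-based Aho-Corasick automaton (not the space-efficient variant the authors describe), character trigrams padded with `^` and `$`, and a Jaccard-similarity threshold of 0.4. The brand list, the names `AhoCorasick`, `NgramIndex`, and `check_url`, and the threshold are all hypothetical.

```python
import re
from collections import defaultdict, deque
from urllib.parse import urlsplit


class AhoCorasick:
    """Textbook Aho-Corasick automaton for multi-pattern exact matching.

    Hypothetical sketch: a plain dict-based trie, NOT the space-efficient
    variant described for PhishMatch.
    """

    def __init__(self, patterns):
        self.goto = [{}]    # goto[state][char] -> next state
        self.fail = [0]     # failure link for each state
        self.out = [set()]  # patterns recognized on reaching each state
        for pat in patterns:
            state = 0
            for ch in pat:
                if ch not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append(set())
                    self.goto[state][ch] = len(self.goto) - 1
                state = self.goto[state][ch]
            self.out[state].add(pat)
        # Breadth-first pass: compute failure links and inherit outputs.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] |= self.out[self.fail[nxt]]

    def find(self, text):
        """Return every pattern occurring anywhere in text, in one linear scan."""
        state, hits = 0, set()
        for ch in text:
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            hits |= self.out[state]
        return hits


def ngrams(s, n=3):
    """Character n-grams of s, padded with ^ and $ to mark the edges."""
    s = f"^{s}$"
    return {s[i:i + n] for i in range(len(s) - n + 1)}


class NgramIndex:
    """Inverted index n-gram -> names; candidates ranked by Jaccard similarity."""

    def __init__(self, names, n=3):
        self.n = n
        self.names = list(names)
        self.grams = [ngrams(name, n) for name in self.names]
        self.index = defaultdict(set)
        for i, gs in enumerate(self.grams):
            for g in gs:
                self.index[g].add(i)

    def lookup(self, token, threshold=0.4):  # threshold is an assumption
        query = ngrams(token, self.n)
        shared = defaultdict(int)  # name id -> number of common n-grams
        for g in query:
            for i in self.index.get(g, ()):
                shared[i] += 1
        results = []
        for i, common in shared.items():
            score = common / (len(query) + len(self.grams[i]) - common)
            if score >= threshold:
                results.append((self.names[i], score))
        return sorted(results, key=lambda r: -r[1])


def check_url(url, exact, approx):
    """Return (exact brand hits, near-miss brand hits) for the URL's hostname."""
    host = urlsplit(url).hostname or ""  # .hostname is already lowercased
    exact_hits = exact.find(host)
    tokens = [t for t in re.split(r"[^a-z0-9]+", host) if t]
    fuzzy_hits = {name for t in tokens for name, _ in approx.lookup(t)}
    return exact_hits, fuzzy_hits - exact_hits
```

With the hypothetical brand list `["paypal", "google", "amazon"]`, the hostname of `http://paypa1-secure.example.com/login` contains no exact brand name, but its token `paypa1` shares 4 of 8 distinct trigrams with `paypal` (Jaccard 0.5) and is flagged as a near match. A fuller pipeline would consult the global and per-user whitelists before raising an alert, as the abstract describes; that step is left out here.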

We study the expressibility and learnability of convex optimization solution functions and their multi-layer architectural extension. The main results are: (1) the class of solution functions of linear programming (LP) and quadratic programming (QP) is a universal approximant for the $C^k$ smooth model class or some restricted Sobolev space, and we characterize the rate-distortion; (2) the approximation power is investigated through the viewpoint of regression error, where information about the target function is provided in terms of data observations; (3) compositionality in the form of a deep architecture with optimization as a layer is shown to reconstruct some basic functions used in numerical analysis without error, which implies that (4) a substantial reduction in rate-distortion can be achieved with a universal network architecture; and (5) we discuss the statistical bounds of empirical covering numbers for LP/QP, as well as for a generic (possibly nonconvex) optimization problem, by exploiting tame geometry. Our results provide the first rigorous analysis of the approximation and learning-theoretic properties of solution functions, with implications for algorithmic design and performance guarantees.
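As a small illustration of point (3), our own and not taken from the paper: the optimal-solution map of a one-variable LP is exactly ReLU, the optimal-solution map of a one-variable QP is exactly clipping to an interval, and the maximum of two numbers follows exactly by composing the LP layer with an affine map. Each identity holds without approximation error.

```latex
\begin{align*}
\mathrm{ReLU}(x) = \max(x, 0)
  &= \operatorname*{arg\,min}_{y \in \mathbb{R}} \{\, y : y \ge x,\ y \ge 0 \,\} \\
\operatorname{clip}_{[0,1]}(x)
  &= \operatorname*{arg\,min}_{y \in \mathbb{R}} \{\, \tfrac{1}{2}(y - x)^2 : 0 \le y \le 1 \,\} \\
\max(a, b)
  &= b + \mathrm{ReLU}(a - b)
\end{align*}
```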
